(57) Abstract

A method of exon amplification, which is useful for
fast and efficient isolation of a coding sequence from
complex mammalian genomic DNA. Fragments of genomic
mammalian DNA are inserted into an intron contained
within a splicing plasmid, resulting in a splicing
plasmid construct. The construct is introduced into an
appropriate host cell, resulting in replication of and
transcription from the construct. The transcripts are
processed into mature RNA. If an exon is present in the
genomic DNA fragment contained in the plasmid intron,
the splice sites of the DNA insert can be paired with
5' and 3' splice sites provided in the splicing
construct. Mature RNA, which contains transcribed exons
from the genomic DNA, is isolated, amplified via
RNA-based PCR, and subsequently cloned.

METHOD OF EXON AMPLIFICATION
Description 

Background 

Understanding the molecular basis of human genetic
disorders and corresponding genotypes in other
mammalian genomes requires methods for the
identification of coding sequences in target
chromosomal regions. Current methods used for this
purpose are both inefficient and tedious. The strategy
used most frequently involves the screening of short
genomic DNA segments for sequences which are
evolutionarily conserved (Monaco, A.P. et al., Nature
(London) 323:646-650 (1986); Page, D.C. et al., Cell
51:1091-1104 (1987); Rommens, J.M. et al., Science
245:1059-1065 (1989); Call, K.M. et al., Cell
60:509-520 (1990)). Although successful, this approach
has several limitations. Sequences contained in mRNA
do not necessarily show such cross-species
conservation. To avoid interference with repetitive
sequences, short segments of cloned genomic DNA must
be tested individually as probes in Southern blotting
experiments, a time-consuming procedure. Finally, this
standard approach does not directly lead to
identification of transcribed segments of the cloned
genomic DNA, but requires isolation of cDNAs
corresponding to the transcribed region. This assumes
knowledge of specific tissue expression of a given
transcript, as well as adequate expression levels in
the RNA used for construction of the cDNA library.
Alternative strategies for gene isolation involve
sequencing and analysis of large segments of genomic
DNA for the presence of open reading frames (Fearon,
E.R. et al., Science 247:49-56 (1990)), and cloning of
hypomethylated CpG islands, which are signposts of
5'-ends of transcription units (Bird, A., Nature
(London) 321:209-213 (1986)). These methods also do
not provide a direct means of purifying coding
sequences from genomic DNA. At present, an efficient,
sensitive method of isolating coding sequences from
complex genomic DNA is not available.

Summary of the Invention 

The present invention relates to a method of exon
amplification, which is useful for fast and efficient
isolation of a coding sequence from complex mammalian
genomic DNA. The present method is based upon the
ability of RNA transcripts to undergo processing
events which require the presence of splice
recognition and branchpoint sequences. In the present
method of isolating a coding sequence or exon from
mammalian genomic DNA, fragments of genomic mammalian
DNA are inserted into an intron contained within an in
vivo splicing plasmid, resulting in production of an
in vivo splicing plasmid construct, in which mammalian
genomic DNA is present within a plasmid intron. The
constructs are introduced (e.g., transfected by
electroporation) into an appropriate host cell type,
such as COS7 cells, in which they replicate. Host
cells containing the in vivo splicing plasmid
constructs are maintained under conditions appropriate
for replication of and transcription from the in vivo
splicing plasmid constructs, resulting in production
and in vivo processing of RNA in the host cells. If an
exon is present in the genomic DNA fragment contained
within the plasmid intron, the vector splice sites are
paired with splice sites of the DNA insert. The
resulting mature RNA contains the previously
unidentified exons (i.e., it is a mature RNA
transcript of the genomic DNA). The mature RNA
containing the exons is isolated and can then be
amplified via RNA-based PCR, and subsequently cloned.
Alternatively, the PCR product can be purified and
sequenced directly. For example, mature RNA obtained
from host cells in which in vivo plasmid constructs
have replicated can be screened with an appropriate
probe, such as an anti-sense oligonucleotide, to
identify mature RNA transcripts of the exon. RNA
transcripts of the exon are amplified using an
RNA-based amplification method (e.g., RNA-based PCR),
thereby producing cDNA of the mammalian genomic DNA.
cDNA produced in this manner is purified (to separate
it from other cDNA present) and the purified cDNA is
directly sequenced or digested with at least one
restriction enzyme which recognizes a restriction site
or sites present in the splicing vector or in the PCR
product. The digested purified cDNA is introduced into
a cloning vector (e.g., pBluescriptIISK+, Stratagene)
and cloned. The cloned DNA can be sequenced using
known methods.

In one embodiment of the present invention, genomic
fragment(s) having compatible ends are cloned into the
in vivo splicing plasmid pSPL1 at a BamHI site created
within the HIV-tat intron (see Figure 1a). The clone
is then transiently transfected by electroporation
into COS7 cells, after which amplification of the
plasmid occurs by virtue of the SV40 origin of
replication. High levels of transcription are
facilitated by the SV40 promoter, resulting in
production of RNA containing the introduced genomic
sequence. If the genomic sequence contains an exon in
the proper orientation, processing occurs in such a
manner that the exon is retained in the mature RNA,
flanked by HIV tat and β-globin exon sequences.
Subsequently, cytoplasmic RNA is isolated from the
COS7 cells and subjected to RNA-based PCR analysis
using oligodeoxynucleotides which hybridize to the
flanking β-globin sequences. The amplified product
contains the introduced exon sequence, and can be
analyzed by cloning and sequencing or by direct PCR
sequencing.

The present method has been used to isolate exon
sequences from cloned genomic fragments known to
contain exon sequences (i.e., fragments of a mouse
clone known to contain exon sequences of the murine
Na,K-ATPase α1 subunit gene). The present method has
also been used to isolate exon sequences of a specific
gene (i.e., the DNA repair gene, ERCC1) from randomly
selected genomic clones known to be derived from a
segment of human chromosome 19. The sensitivity and
ease of the exon amplification method are such that
20-40 kbp of genomic DNA have been screened in a
single transfection. The subject method can be used
for rapid identification of transcribed segments of
mammalian genomes and the generation of chromosomal
transcription maps.

Brief Description of the Drawings

Figure 1 is a schematic representation of the
structure of the in vivo splicing plasmid pSPL1 and a
schematic representation of the exon amplification
method of the present invention.

Figure 2 shows the results of sequence analysis of
amplified product derived from λ genomic clone 5W.

Detailed Description of the Invention 

The present invention relates to a method of isolating
exon sequences from genomic DNA in which exon
sequences are rescued by virtue of selection for
functional 5' and 3' splice sites. The method of the
present invention, referred to as exon amplification,
is useful for isolating coding sequences from complex
mammalian genomic DNA and for screening 20-40 kbp of
genomic DNA in a single transfection. It is based upon
the ability of RNA transcripts to undergo processing
events which require the presence of splice
recognition and branchpoint sequences. In the present
method, random segments of chromosomal DNA are
inserted into an intron present within an in vivo
splicing plasmid, the resulting in vivo splicing
plasmid construct is transfected into an appropriate
mammalian host cell, and the transfected host cell is
maintained under conditions appropriate for
replication of the in vivo splicing plasmid construct.
Transcription from the promoter of the splicing
construct, followed by RNA processing, results in
production of cytoplasmic mRNA, which is screened by
PCR amplification for the acquisition of an exon from
the genomic fragment. In particular, as described
herein, fragments of cloned genomic DNA are inserted
into the intron contained within an in vivo splicing
plasmid and the resulting construct is introduced into
an appropriate host cell (one in which the in vivo
splicing plasmid construct can replicate or be
amplified). If the fragment introduced into the vector
contains an exon, the vector splice sites are paired
with splice sites of the inserted fragment. The
resulting mature cytoplasmic RNA contains the
previously unidentified exons, which are cloned by
known techniques, for example by means of RNA-based
PCR amplification and cloning.
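The logic of the method can be illustrated with a toy
model. The following Python sketch is not part of the
described procedure; it assumes a deliberately
simplified picture in which a primary transcript is a
list of labeled segments and processing simply removes
every intron. The segment labels and function names are
hypothetical.

```python
from typing import List, Tuple

# A segment is (kind, label), where kind is "exon" or "intron".
Segment = Tuple[str, str]


def build_construct(insert: List[Segment]) -> List[Segment]:
    """Place a genomic insert inside the vector intron (toy representation)."""
    return (
        [("exon", "vector 5' exon"), ("intron", "vector intron, 5' part")]
        + list(insert)
        + [("intron", "vector intron, 3' part"), ("exon", "vector 3' exon")]
    )


def splice(primary_transcript: List[Segment]) -> List[str]:
    """Idealized processing: keep exons in order, drop all intron segments."""
    return [label for kind, label in primary_transcript if kind == "exon"]


# Vector without an insert: the two vector exons are joined directly.
empty_vector_rna = splice(build_construct([]))

# Insert carrying one exon with flanking intron sequence: the exon is retained.
insert_with_exon = [
    ("intron", "genomic intron"),
    ("exon", "genomic exon"),
    ("intron", "genomic intron"),
]
trapped_rna = splice(build_construct(insert_with_exon))

print(empty_vector_rna)
print(trapped_rna)
```

In this idealized picture an insert without a usable
exon (for example one in the wrong orientation) would
be modeled as intron-only segments, and the mature RNA
would then be identical to that of the empty vector.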

Previous studies have shown that introns constructed
with novel combinations of 5' and 3' splice sites from
diverse genes are actively spliced. Thus, this method
is generally applicable for the selection of exon
sequences from any gene. The method is also rapid and
easily adapted to large scale experiments. A series of
cloned genomic DNA fragments can be screened within
one to two weeks. The sensitivity of this method is
high. Genomic DNA segments of 20 kb or more can be
successfully screened in a single transfection using a
set of pooled subclones. This method thus allows the
rapid identification of exons in mammalian genomic DNA
and should facilitate the isolation of a wide spectrum
of genes of significance in physiology and
development.

In the present method, genomic DNA from any mammalian
cell type can be analyzed for the presence of exons.
DNA to be analyzed is obtained (e.g., from a cell or
cloned genomic DNA, such as a cosmid library),
fragmented using known methods, and introduced into an
appropriate in vivo splicing plasmid, such as the
pSPL1 vector described herein. DNA to be introduced
into the in vivo splicing plasmid is digested with
appropriately-selected restriction enzyme(s) (i.e.,
restriction enzyme(s) which make it possible for the
resulting fragments to be introduced into an insertion
or cloning site or sites in a plasmid intron in the in
vivo splicing plasmid). For example, in the case of
pSPL1, DNA is digested with BamHI, resulting in DNA
fragments which can be inserted into the unique BamHI
site in the plasmid intron (see Figure 1). The
resulting fragments are subcloned into the in vivo
splicing plasmid at the insertion site and the
resulting in vivo splicing plasmid construct is
introduced (e.g., using transfection by
electroporation) into an appropriate host cell, in
which replication or amplification of the plasmid
occurs. RNA containing the introduced genomic fragment
is produced and is processed in such a way that the
mature RNA (mRNA) includes the exon.

The pSPL1 vector is representative of in vivo splicing
plasmids useful in the present method of exon
amplification. Other plasmid vectors which contain a
promoter, such as the SV40 or CMV promoter, whose
presence and function in the in vivo splicing plasmid
results in high levels of transcription, and which
contain a plasmid intron into which a DNA fragment can
be inserted, including splice junctions or splice
sites, and a branchpoint recognition sequence, can be
used. Any convenient insertion site or sites present
in the intron and located 5' to the branchpoint
recognition sequence will be useful for efficient
splicing of the exon introduced into the splicing
plasmid. In one embodiment, the splicing vector can be
engineered using known techniques to introduce a
multiple cloning site into the splicing vector's
intron, preferably 5' of the branchpoint sequence, to
increase the available convenient restriction sites.
Particularly useful are vectors which include an
origin of replication under the control of which
amplification occurs in the host cells, increasing the
number of templates for transcription. A bacterial
origin of replication and selectable marker are useful
for propagation of the plasmids in bacterial hosts,
and a mammalian polyadenylation signal is useful for
message stability, efficient processing and transport
of transcripts in the mammalian host cell.

Splice junctions or splice sites, such as those of the
HIV-1 tat gene present in pSPL1, which are not as
efficient as others, are also particularly useful. The
5' and 3' splice site sequences flanking the pSPL1
vector exons, which were selected to minimize exon
skipping, are a particular advantage of the pSPL1
vector. These splice sites, derived from the tat exons
of HIV-1, are slowly spliced in both in vivo and in
vitro systems. The inefficient splice sites of tat are
therefore compatible for reactions with splice sites
from unrelated genes, and have been shown to be
efficiently spliced to sites flanking the exons of the
rat preproinsulin and the rabbit β-globin genes.

The transcribed sequences flanking the intron in the
splicing vector are preferably of known sequence to
facilitate selection of appropriate primers for
amplification. In addition, these sequences preferably
do not contain repetitive sequences, termination
signals or additional functional splicing signals,
which may interfere with joining of the genomic exon
sequences with the desired 5' and 3' splice sites
flanking the splicing vector's intron.

In a further embodiment, the splicing vector can be
modified such that the products resulting from
self-splicing (i.e., joining of the vector's 5' and 3'
splice sites together, rather than to genomic
sequences) can be minimized prior to amplification by
destruction of those templates. For example, the
region immediately adjacent to the splice recognition
sites can be altered such that, when the vector
sequences are spliced together instead of to a genomic
sequence (e.g., when there is no insert, or when the
genomic exon is skipped), a restriction endonuclease
site is created in the cDNA product of the transcript.
Following the synthesis of double stranded cDNA from
isolated mature RNA and prior to amplification,
digestion with the appropriate restriction enzyme
cleaves these templates. Because the cleavage site is
between the primer binding sites, the double stranded
amplification product from these templates is
minimized or abolished. For example, a restriction
enzyme, such as BstXI, which recognizes an interrupted
palindrome can be used. This enzyme recognizes the
sequences CCA and TGG separated by six nucleotides of
unspecified sequence; thus, the splice recognition
sequences can be accommodated in the unspecified
nucleotides. This process can result in an enrichment
of products containing a mammalian exon, and increases
the sensitivity of the procedure in detecting such
exons. Thus, the amount of genomic DNA that can be
screened for exons in one step is increased
substantially. For example, YAC clones with inserts of
some 200-500 kb can be accommodated by increasing the
sensitivity in this manner.
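The selection against self-spliced templates described
above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the two
cDNA fragments are invented for the example (they are
not vector sequences), and the helper name
find_bstxi_sites is hypothetical. The only fact taken
from the text is the recognition pattern, CCA and TGG
separated by six unspecified nucleotides.

```python
import re

# BstXI recognition pattern as stated in the text: CCA, six unspecified
# nucleotides, then TGG. The lookahead lets overlapping sites be reported.
BSTXI_SITE = re.compile(r"(?=(CCA[ACGT]{6}TGG))")


def find_bstxi_sites(seq: str) -> list:
    """Return 0-based start positions of BstXI recognition sites in seq."""
    return [match.start() for match in BSTXI_SITE.finditer(seq.upper())]


# Invented example sequences (assumptions for illustration only).
# Vector exons joined directly: the junction completes a BstXI site.
self_spliced_cdna = "GATTACCAGGTAAGTGGCATGC"
# A trapped exon between the vector exons separates the two half-sites.
exon_containing_cdna = "GATTACCAGGTAAGACGTACGTTGGCATGC"

print(find_bstxi_sites(self_spliced_cdna))
print(find_bstxi_sites(exon_containing_cdna))
```

In this sketch a template with at least one site would
be cut between the primer binding sites and drop out of
the amplification, while a template without a site
would be amplified normally.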

Cytoplasmic RNA produced in host cells containing the
in vivo plasmid construct is isolated from the host
cells and subjected to amplification (e.g., by
RNA-based PCR, as described in Example 1). Following
the reverse transcription and amplification reactions
resulting in DNA production, the RNA/PCR product is
cloned by purifying the appropriate DNA product,
digesting it with appropriate restriction enzymes
(e.g., SalI and MscI) and cloning it into appropriate
restriction sites (e.g., SalI and EcoRV sites,
respectively) in an appropriate plasmid (e.g.,
pBluescriptIISK+, Stratagene). Cloned products are
then sequenced, using known techniques, such as the
dideoxy chain termination method. Purification of the
PCR product is not necessary prior to cloning, when a
suitable probe for identification of constructs
containing genomic exons is available. The PCR
products can also be sequenced directly without
cloning. For example, in one convenient approach, the
appropriate PCR fragment can be purified, reamplified
with primers adjacent to the vector splice sites (and,
therefore, adjacent to the genomic exon) and
sequenced.

The method of exon amplification described herein is a
rapid and efficient technique for the identification
of expressed DNA sequences in complex mammalian
genomes. This method circumvents the laborious
characterization of a cloned genomic DNA segment and
permits a direct transition to a cDNA. The initial
need for appropriate sources of RNA for isolation of
cDNA clones is thus also circumvented. The efficacy of
exon amplification has been clearly demonstrated, as
described particularly in the Examples, by the
identification and cloning of exons from a cosmid
known to contain a portion of the mouse Na,K-ATPase α1
subunit gene, as well as exon sequences of the human
DNA repair gene, ERCC1, from an uncharacterized λ
genomic clone. The present method of exon
amplification can be used, for example, for rapidly
determining the tissues in which a particular gene is
expressed, either by northern analysis or by in situ
hybridization. It can also be of use in the isolation
of complete cDNAs by library screening procedures or
by anchored-PCR techniques (Loh, E.Y., et al., Science
243:217-220 (1989)).
Exon amplification can also be used to complement
recently developed methods for isolating transcribed
segments of the human genome. Two such methods involve
the use of human-hamster hybrid cell lines containing
specific regions of the human genome. The first
approach involves the generation of a cDNA library
from heteronuclear (hn) RNA using oligonucleotides
complementary to consensus 5' intron splice sequences
(Duyk, G.M., et al., Proc. Natl. Acad. Sci. USA
87:8995-8999 (1990)), followed by screening for
human-specific repetitive sequences. The second method
utilizes oligonucleotides from the conserved region of
Alu repetitive sequences to generate cDNAs from hnRNA
by PCR (Corbo, L. et al., Science 249:652-655 (1990)).
The efficacy of both of these strategies is, however,
dependent on significant expression of human-specific
RNA in the hybrid cells. The products derived from
these approaches must also be made free of repetitive
sequences prior to use as probes for screening or
blotting experiments, due to the presence of
repetitive sequences in the introns of these unspliced
RNAs. This problem can be resolved by the application
of exon amplification, using cDNAs from hnRNA as
starting material. Since these cDNAs represent cloned
transcription units, the combination of these
approaches should greatly facilitate the cloning of
coding sequences.

Use of the present method of exon amplification in
large scale screening for transcribed sequences may
provide a new approach to genetic mapping. For
instance, the construction of transcription maps for
large segments of mammalian genomes is technically
feasible using this method. Such an approach could
provide a powerful adjunct in the fine mapping of the
human genome, and would enhance the efficiency with
which genes responsible for numerous genetic disorders
are identified.

The nature of the sequence and structure specificity
underlying the selection of exons during the splicing
of normal nuclear precursor RNAs is not well
understood. This specificity is sufficient to screen
introns over 100,000 nucleotides in length in the
accurate joining of the flanking exons. Experiments
suggest that this remarkable specificity is not
dictated by the unique nature of the two exons
flanking an intron. In fact, all 5' and 3' splice
sites are thought to be generically compatible for
accurate splicing. This is typified by the accurate
splicing of a hybrid intron where the 5' splice site
is derived from an exon of the rat preproinsulin gene
and the 3' splice site is from a viral exon. These
results suggest that the present exon amplification
method should be able to identify most of the exons
within a genomic fragment.

The present method will now be illustrated by the
following examples, which are not intended to be
limiting in any way.

EXAMPLE 1 Isolation and Cloning of Exon Sequences
from Individual Segments of Genomic DNA

Fragments of a mouse cosmid clone, MaG#9, known to
contain exon sequences of the Na,K-ATPase α1 subunit
gene (Tam, S.-Y., et al., Mol. Cell. Biol.
10:6619-6623 (1990)), were used as described below to
demonstrate the ability of the present method to
amplify exons from complex mammalian genomic DNA.
Below is a description of the materials and methods
used throughout the work described herein, followed by
a description of their use with the mouse clone to
identify the specific subunit gene.

WO 92/13071    PCT/US92/00692

Cell Culture and Electroporation 

COS7 cells (clonal line A6) were propagated in DME
medium supplemented with 10% inactivated fetal calf
serum. For transfections, COS7 cells were grown to
75-85% confluency, trypsinized, collected by
centrifugation, and washed in ice-cold phosphate
buffered saline (PBS) in the absence of divalent
cations. The washed cells (approximately 4 x 10^) were
then resuspended in 0.7 ml cold PBS and combined in a
precooled electroporation cuvette (0.4 cm chamber, Bio
Rad) with 0.1 ml PBS containing 1-15 μg DNA. After 10
minutes on ice the cells were gently resuspended,
electroporated (1.2 kV [3 kV/cm], 25 μF) in a Bio Rad
Gene Pulser, and placed on ice again. After 10 minutes
the cells were transferred to a tissue culture dish
(100 mm) containing 10 ml prewarmed, preequilibrated
culture medium.
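The bracketed field strength in the electroporation
settings follows from the pulse voltage and the cuvette
gap. The short Python sketch below simply reproduces
that arithmetic; the function name is hypothetical and
the calculation assumes a uniform field across the
0.4 cm chamber.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal field strength, assuming a uniform field across the gap."""
    return voltage_kv / gap_cm


# Values stated in the text: 1.2 kV pulse, 0.4 cm cuvette chamber.
field = field_strength_kv_per_cm(1.2, 0.4)
print(f"{field:.1f} kV/cm")
```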

Vector Construction and Oligonucleotides

pSPL1 was constructed as follows: A 2.7 kbp TaqI
fragment from pgTat (corresponding to nucleotides
68-2775 of HIV isolate HXB3) (Malim, M.H. et al.,
Nature (London) 335:181-183 (1988)) was cloned into a
SalI site of pBluescript+ (Stratagene). A 2.6 kbp
BamHI-PstI fragment was isolated from this construct
and used to replace the BamHI-EcoRI region of
pSβ-IVS2 (Buchman, A. et al., Mol. Cell. Biol.
8:4395-4405 (1988)), a shuttle vector containing the
SV40 origin of replication and early region promoter
upstream of rabbit β-globin sequences, including
β-globin IVS2. This results in removal of β-globin
IVS2 and addition of HIV-tat intron and flanking exon
sequences. The EcoRI and PstI sites were removed by
blunt-end cloning. The BamHI site in this construct
was subsequently removed by BamHI digestion followed
by blunting with mung bean nuclease. Finally, a BamHI
site was inserted into the HIV-tat intron at the
unique KpnI site.

Oligonucleotide pairs and the predicted lengths of the
PCR products generated by spliced RNA from the vector
are:

DHAB15, CCAGTGAGGAGAAGTCTGCGG and
DHAB14, GTGAGCCAGGGCATTGGCC: 689 bp product;
SD2, GTGAACTGCACTGTGACAAGC and
SA2, ATCTCAGTGGTATTTGTGAGC: 429 bp product;
SD1, CCCGGATCCGCGCGACGAAGACCTCCTCAAGGC (BamHI cloning
site at 5'-end) and
SA1, CCCGTCGACGTCGGGTCCCCTCGGGATTGG (SalI cloning
site at 5'-end): 102 bp product.

The antisense oligonucleotides (DHAB14 and SA2) were
used as primers in the first strand cDNA synthesis
reactions (see below). SD1 and SA1 are internal to the
initial RNA/PCR product and were used for
reamplification of RNA/PCR products.
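The statement that SD1 and SA1 carry cloning sites near
their 5'-ends can be checked directly against the
sequences. The Python sketch below is illustrative
only: the primer sequences are transcribed from the
oligonucleotide list above, and the transcription is an
assumption wherever the scanned text is ambiguous. The
recognition sequences used (GGATCC for BamHI, GTCGAC
for SalI) are the standard ones; the helper name is
hypothetical.

```python
# Primer sequences transcribed from the list above (assumed transcription).
PRIMERS = {
    "DHAB15": "CCAGTGAGGAGAAGTCTGCGG",
    "DHAB14": "GTGAGCCAGGGCATTGGCC",
    "SD2": "GTGAACTGCACTGTGACAAGC",
    "SA2": "ATCTCAGTGGTATTTGTGAGC",
    "SD1": "CCCGGATCCGCGCGACGAAGACCTCCTCAAGGC",
    "SA1": "CCCGTCGACGTCGGGTCCCCTCGGGATTGG",
}

# Standard recognition sequences of the two cloning enzymes.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def site_offsets(primer_name: str) -> dict:
    """Map each enzyme to the 0-based offset of its site, or -1 if absent."""
    seq = PRIMERS[primer_name]
    return {enzyme: seq.find(site) for enzyme, site in SITES.items()}


for name in PRIMERS:
    print(name, site_offsets(name))
```

With these sequences, each cloning site sits three
nucleotides from the 5'-end of its primer, and the
outer primers (DHAB14, DHAB15, SD2, SA2) contain
neither site.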
RNA Isolation, RNA/PCR Amplifications, and Cloning

Cytoplasmic RNA was isolated 48-72 hours
post-transfection. Briefly, cells were washed three
times with ice cold PBS, scraped in 10 ml ice cold
PBS, and collected by centrifugation. The cell pellet
was then resuspended on ice in a low ionic strength
buffer (10 mM Tris pH 7.5, 1 mM KCl, 1 mM MgCl2) and
cells were lysed by the addition of 0.5% (cf) Triton
X-100. Nuclei were removed by centrifugation, 0.5% SDS
(cf) was then added to the supernatant, which was
subsequently extracted with Tris-buffered phenol
followed by phenol:chloroform (1:1). RNA was
precipitated by the addition of 0.2 M NaCl (cf) and
2.5 volumes of ethanol, followed by storage at -20°C.
RNA was quantitated by A260 determination.

First strand cDNA synthesis was performed as follows:
RNA (2.5 or 5 μg) was added to a reverse transcription
solution consisting of 10 mM Tris pH 8.3, 50 mM KCl,
1.5 mM MgCl2, 0.001% gelatin, and the mixture was
heated to 65°C for 5 minutes. 3.5 U RNasin (Promega)
and 200 U MMLV reverse transcriptase (BRL) were added
to the reaction (final volume: 25 μl), which was then
incubated at 42°C for 90-120 minutes.

The entire reverse transcription reaction was then
subjected to PCR amplification in a Thermocycler
(Perkin-Elmer-Cetus) using the appropriate
oligonucleotide pairs. Thirty-five amplification
cycles were routinely used, and consisted of 1 minute
at 94°C, 2 minutes at 55-58°C and 3 minutes at 72°C.
Products were visualized by staining with ethidium
bromide after electrophoresis in 1-1.5% agarose gels.
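For planning purposes, the programmed hold time of this
cycling profile can be added up as follows. The Python
sketch is a simple tally of the step durations given
above; it ignores temperature ramp times and any
initial or final hold, which the text does not specify,
and the names used are hypothetical.

```python
# Cycle profile from the text: (step, minutes per cycle, temperature).
CYCLE_STEPS = [
    ("denaturation", 1, "94 C"),
    ("annealing", 2, "55-58 C"),
    ("extension", 3, "72 C"),
]
CYCLES = 35


def total_hold_minutes(steps, cycles: int) -> int:
    """Sum of programmed hold times over all cycles (ramp times excluded)."""
    return cycles * sum(minutes for _, minutes, _ in steps)


total = total_hold_minutes(CYCLE_STEPS, CYCLES)
print(f"{total} min ({total / 60:.1f} h) of programmed hold time")
```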

To clone the RNA/PCR product, the appropriate DNA
fragment was purified from low melting point agarose
and digested with SalI and MscI, unique restriction
sites present within the vector (β-globin) sequences,
and cloned into the SalI and EcoRV sites of
pBluescriptIISK+ (Stratagene). Alternatively, the
gel-purified product was subjected to a second PCR
amplification using the internal oligonucleotide pair
SD1 and SA1, which flank the vector splice junctions
and contain BamHI and SalI cloning sites,
respectively. The product from this reaction was gel
purified, end-repaired with T4 DNA polymerase (New
England Biolabs), digested with BamHI and SalI, and
cloned into pBluescriptIISK+. Cloned products were
sequenced using the dideoxy chain termination method
(Sanger, F. et al., Proc. Natl. Acad. Sci. USA
74:5463-5467 (1977)).

Blot analysis 

Restriction endonuclease digested genomic DNA clones
were electrophoresed through 0.8% or 0.9% agarose
gels. RNA/PCR products were electrophoresed through
1-1.5% agarose gels. RNA samples were electrophoresed
through a 1% agarose, 6% formaldehyde gel and blotted
onto a Genescreen Plus membrane (New England Nuclear).
Filters were hybridized using standard procedures
(Sambrook, J. et al., Molecular Cloning: A Laboratory
Manual (2nd ed.), Cold Spring Harbor Laboratory Press
(1989)). DNA probes were radiolabelled to high
specific activity using the random primer method
(Feinberg, A.P. et al., Biochem. Biophys. Res. Commun.
111:47-54 (1983)).

The strategy for exon amplification is outlined in
Figure 1. A vector, designated pSPL1, was designed to
make it possible to insert mammalian genomic DNA
segments approximately 1 to 4 kbp long at a BamHI site
present within an intron in the vector. Construction
of this vector is described in detail in Example 1.
The insertion site is within an intron from the HIV-1
tat gene whose flanking exons and splice sites were
substituted for the second intron of the rabbit
β-globin gene. The reporter gene is transcribed by the
SV40 early promoter and a polyadenylation signal is
derived from SV40. Upon transfection of this plasmid
construct into COS7 cells, RNA transcripts are
efficiently generated and the tat intron sequences are
spliced to produce a polyadenylated cytoplasmic RNA.

When a fragment containing an entire exon with
flanking intron sequence (in the proper orientation)
is inserted into the BamHI site of the pSPL1 vector,
the exon should be retained in the mature poly A+
cytoplasmic RNA.

Isolation and Cloning of Exon Sequences of the
Na,K-ATPase α1 subunit gene

A known source (fragments of a mouse cosmid clone,
MaG#9) of exon sequences of a gene (the Na,K-ATPase α1
subunit gene (Tam, S.-Y. et al., Mol. Cell. Biol.
10:6619-6623 (1990))) was used to demonstrate that
when a fragment containing an entire exon with
flanking intron sequence (in the proper orientation)
is inserted into an intron of an in vivo splicing
plasmid, the exon is retained in the mature poly A+
cytoplasmic RNA. Fragments of the mouse cosmid clone
were subcloned into pSPL1. A 3.5 kbp BglII fragment of
the cosmid was subcloned into the BamHI site of pSPL1
in sense and antisense orientations and the resulting
constructs were introduced by transfection into COS7
cells. Cytoplasmic RNA preparations derived from the
transfectants were analyzed by northern blotting,
using a radiolabeled NcoI-BamHI fragment spanning
nucleotides 111-705 of the murine Na,K-ATPase α1 cDNA
as probe (Kent, R.B., et al., Science 237:901-903
(1987)). An abundant 2.2 kb RNA species was detected
only in cells transfected with the sense construct,
indicating expression and processing of the
transfected sequences.

To isolate spliced exons contained within the
vector-derived RNA sequences, an RNA-based PCR
(RNA/PCR) method was used. Cosmid MaG#9 was
digested with either BamHI or BglII, or with the
combination of these endonucleases, followed by
"shotgun" cloning into pSPL1. These constructs were
then screened for the presence of exon sequences as
described above, using β-globin specific
oligodeoxynucleotides (oligonucleotides SD2 and SA2) as
RNA/PCR primers. The sense and antisense constructs
described in Figure 1a were similarly analyzed. The
resulting RNA/PCR products were visualized by
electrophoresis through 1.5% agarose gels and
staining with ethidium bromide. As expected,
oligodeoxynucleotide primers SD2 and SA2 generated
an RNA/PCR product of 429 bp from RNA of transfectants
with the pSPL1 vector. The product migrating
at 429 bp is derived from splicing occurring between
vector 5' and 3' splice sites. A ~300 bp product is
present in all lanes, including mock-transfected (no
DNA) cells, indicating that this product is an
artifact derived from the COS7 cell background.
Analysis of RNA from COS7 cells transfected with the
3.5 kbp BglII fragment inserted in the sense
orientation into pSPL1 yielded a PCR product of
approximately 1.5-1.6 kbp. Transfection of a recombinant
containing the same fragment in the opposite
orientation only yielded the 429 bp PCR product
containing vector sequences. Hybridization of the
mouse α1 subunit cDNA to blots containing these
RNA/PCR products confirmed that sequences derived
from the sense construct consist of ATPase exons.
"Sense" and "antisense" RNA/PCR products from an
experiment similar to that described above were
blotted and hybridized to the NcoI-BamHI fragment
probe. The larger size of the product detected in
the "sense" lane (approximately 1.8 kbp), when
compared to the other product generated, is due to
use of the oligonucleotide pair DHAB14 and DHAB15 in
the RNA/PCR reaction, which will amplify 689 bp of
vector sequence. The length and restriction pattern
of the RNA/PCR product derived from sense
transfectants are consistent with proper splicing of six
ATPase exons.

A more detailed analysis of the RNA/PCR product
generated by a 2.8 kbp BglII fragment from the MaG#9
cosmid was also performed. Insertion of this
fragment into pSPL1 and transfection yielded a 600
bp RNA/PCR product, which was subsequently cloned
and sequenced, using the methods described above.
This product contained exon sequences of the α1
cDNA, spanning 171 bp from bp 125-295. This
represents precisely two exons of the gene, whose
sequence and structure have recently been
characterized. This is proof that accurate processing
occurred between tat and α1 splice recognition
sequences, resulting in the removal of the HIV-tat
and ATPase intron sequences, and the insertion of
ATPase exons in the vector-derived mature RNA.
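
The size comparisons in Example 1 reduce to simple arithmetic: the
amount of rescued exon sequence is roughly the RNA/PCR product length
minus the vector-only product length for the same primer pair. The
Python sketch below is an illustration only. The function names and the
lookup table are hypothetical, and assigning the 600 bp product to the
SD2/SA2 primer pair is an assumption, since the text does not state
which pair was used for that reaction.

```python
# Vector-only RNA/PCR product sizes (bp) reported in the text per primer pair.
VECTOR_ONLY_BP = {
    ("SD2", "SA2"): 429,
    ("DHAB14", "DHAB15"): 689,
}


def exon_insert_length(product_bp, primer_pair):
    """Estimate the exon sequence (bp) carried by an RNA/PCR product."""
    vector_bp = VECTOR_ONLY_BP[primer_pair]
    if product_bp < vector_bp:
        raise ValueError("product is smaller than the vector-only product")
    return product_bp - vector_bp


def span_length(first_bp, last_bp):
    """Length of an inclusive cDNA coordinate range."""
    return last_bp - first_bp + 1


# 600 bp product from the 2.8 kbp BglII fragment (SD2/SA2 pair is assumed).
print(exon_insert_length(600, ("SD2", "SA2")))  # 171
# cDNA positions 125-295, as reported for the sequenced product.
print(span_length(125, 295))                    # 171
```

Under that assumption, the two figures agree: the 600 bp product exceeds
the 429 bp vector-only product by the same 171 bp that the sequenced
exons span in the cDNA.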

EXAMPLE 2 Isolation and Cloning of Multiple Genomic 
DNA Fragments 
Thus, as described in Example 1, it has been
shown that, in its simplest form, the in vivo
splicing selection method of the present invention
can be used to amplify exon sequences from
individual segments of genomic DNA. However, in
situations in which large regions of a chromosome
require analysis in this manner, examination of
single fragments can be extremely cumbersome.
Therefore, assessment of whether multiple fragments
could be analyzed simultaneously was also carried
out, as described in this example. The materials
and methods used were as described in Example 1.
The Na,K-ATPase α1 subunit cosmid, MaG#9, was
digested separately with BamHI, BglII, or with the
combination of BamHI plus BglII. Each digest was
subsequently "shotgun" cloned into pSPL1. These
mixtures of clones were then transfected into COS7
cells and the resulting RNA was analyzed by RNA/PCR
for the presence of products larger than that from
pSPL1 alone. In this situation the predominant RNA
will contain only sequences from the vector pSPL1,
since the majority of genomic fragments contain no
exon sequences or are inserted in the antisense
orientation. PCR analysis of RNA preparations from
cells transfected with "shotgun" clones of BamHI,
BglII, or BamHI plus BglII digestions of MaG#9
generated multiple products larger than the 429 bp
derived from pSPL1. The BglII RNA/PCR product was
gel purified, radiolabelled and directly hybridized
to a BglII restriction digest of MaG#9.
Hybridization of this product to the 2.8 kbp BglII genomic
fragment demonstrated that the amplified product was
derived from a genomic fragment known to contain an
exon. These results indicate that in a situation
where the complexity of the genomic DNA is high,
exon sequences can still be identified in a single
transfection. The 1.6 kbp product detected
following transfection with the 3.5 kbp BglII sense
construct was not observed in the BglII "shotgun"
transfection RNA/PCR product(s). This is most
likely due to competition among PCR templates,
favoring smaller and more abundant substrates.
Also, a weakly staining product migrating at
approximately 650 bp was observed in nearly all
reactions containing RNA from plasmid (including
pSPL1 alone) transfections and is likely to be
an artifact.
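
The screening logic of this example (keep products larger than the
vector-only band, discard bands that also appear in control lanes) can
be sketched as follows. This is a minimal illustration, not part of the
described method: the function name, the 20 bp matching tolerance and
the example band sizes are all hypothetical, and only the 429 bp
vector-only size and the ~300 bp and ~650 bp background products come
from the text.

```python
def candidate_exon_products(sample_bands_bp, control_bands_bp,
                            vector_only_bp=429, tolerance_bp=20):
    """Return sample bands that are larger than the vector-only product
    and have no counterpart (within tolerance) in the control lanes."""
    candidates = []
    for band in sorted(sample_bands_bp):
        if band <= vector_only_bp + tolerance_bp:
            continue  # vector-only product or a smaller background band
        if any(abs(band - ctrl) <= tolerance_bp for ctrl in control_bands_bp):
            continue  # also seen in a control lane, so treated as artifact
        candidates.append(band)
    return candidates


# Hypothetical lanes: controls are mock-transfected cells and pSPL1 alone.
control = [300, 429, 650]
shotgun = [300, 429, 600, 650]
print(candidate_exon_products(shotgun, control))  # [600]
```

Comparing against control lanes rather than a fixed size cutoff is a
design choice made here so that background products such as the 650 bp
band are excluded even though they are larger than 429 bp.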

EXAMPLE 3 Screening of Complex Human Genomic DNA 
To further test the ability of the exon
amplification method to screen complex genomic DNA
for the presence of exons, genomic clones containing
15-20 kbp human genomic DNA inserts were analyzed.
Twelve previously uncharacterized human genomic λ
phage clones, derived from a radiation-reduced
human-hamster hybrid cell line containing a segment
of human chromosome 19, were digested with BamHI
plus BglII, "shotgun" cloned into pSPL1, and
transfected into COS7 cells. RNA preparations from these
transfectants were examined by RNA/PCR. Six of the
12 amplification reactions (designated 1B, 5B, 5C,
5W, 6B and 6C) clearly generated products larger
than the vector-derived 429 bp product, suggesting
that exon sequences are present in each of these
clones. The products from 1B (600-620 bp doublet),
5C (600 bp product) and 5W (620 bp product) were
excised from agarose gels, 32P-labelled using the
random primer method (Feinberg, A.P. et al.,
Biochem. Biophys. Res. Commun. 111:47-54 (1983)),
and hybridized to filters containing blotted DNAs
from the original genomic clones. Each product
hybridized only to the genomic DNA segment from
which it was derived, indicating that the amplified
sequences were not derived from λ phage DNA. The
absence of cross-hybridization to other human DNA
fragments indicated that the PCR products were
essentially free of repetitive sequences. In some
cases, two genomic fragments were detected by these
probes, suggesting that more than one PCR product
was present.

Four of these PCR products were reamplified and
cloned using internal oligonucleotides (SD1 and SA1)
which correspond to sequences immediately flanking
the plasmid splice donor and acceptor sites, and
which contain artificial cloning sites. This was
followed by cloning into pBluescriptIISK+. Sequence
analysis of clones from one of these products,
derived from phage 5W, revealed that the RNA/PCR
product was derived from an exon of the DNA excision
repair gene ERCC1. This gene is located on human
chromosome 19 and is known to be present in the
human-hamster hybrid cell line from which the
genomic clones were derived. A perfect match of the
sequence between the HIV tat splice junctions and
bases 136-247 of the ERCC1 cDNA sequence (van Duin,
M. et al., Cell 44:913-923 (1986)) indicates that an
exon of this gene has been rescued.
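
The "artificial cloning sites" carried by the reamplification primers
can be located with a short Python sketch. The SD1 and SA1 sequences
are those listed in the claims of this document. Reading the embedded
sites as BamHI (GGATCC) and SalI (GTCGAC) is an assumption based on the
standard recognition sequences, since the text does not name the
enzymes, and the helper name find_sites is hypothetical.

```python
# Primer sequences as listed in the claims of this document (5' to 3').
PRIMERS = {
    "SD1": "CCCGGATCCGCGCGACGAAGACCTCCTCAAGGC",
    "SA1": "CCCGTCGACGTCGGGTCCCCTCGGGATTGG",
}

# Standard recognition sequences; pairing them with SD1/SA1 is an assumption.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def find_sites(primer_seq, sites=SITES):
    """Map each enzyme name to the 0-based offset of its site, if present."""
    return {name: primer_seq.find(site)
            for name, site in sites.items()
            if site in primer_seq}


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
# SD1 {'BamHI': 3}
# SA1 {'SalI': 3}

# Length of the matched ERCC1 cDNA segment, bases 136-247 inclusive.
print(247 - 136 + 1)  # 112
```

Each primer carries one candidate site three bases from its 5' end, and
the matched ERCC1 segment is 112 bp long.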

EXAMPLE 4 Screening of Uncharacterized Regions of
the Human Genome
The present method of exon amplification has
also been extended to uncharacterized regions of the
human genome. In preliminary studies, approximately
70% of cosmid genomic clones (23/33) and 45% of λ
phage genomic clones (8/18) have yielded RNA/PCR
products containing potential exon sequences. cDNAs
corresponding to 6 of these products are currently
under characterization. These results demonstrate
the effectiveness of the exon amplification method
in the identification of exon sequences in otherwise
uncharacterized genomic DNA clones.

CLAIMS



1. A method of isolating a coding sequence from
mammalian genomic DNA, comprising the steps of:

a) providing fragmented mammalian genomic
DNA;

b) inserting fragmented mammalian genomic DNA
into an intron of an in vivo splicing
plasmid, thereby producing an in vivo
splicing plasmid construct having
mammalian genomic DNA within a plasmid
intron;

c) introducing the in vivo splicing plasmid
construct into a host cell in which it
replicates;

d) maintaining the product of (c) under
conditions appropriate for replication of
the in vivo splicing plasmid construct and
transcription of DNA present in the in
vivo splicing plasmid construct, thereby
producing a host cell containing mature
RNA; and

e) isolating mature RNA produced in (d).

2. The method of Claim 1 further comprising the
steps of:

f) amplifying the product obtained in (e)
using an RNA-based amplification method,
thereby producing cDNA of the mammalian
genomic DNA;

g) digesting the product of (f) with a
restriction enzyme, which recognizes a
restriction site present in a cloning
vector; and

h) cloning the product of (g) in the cloning
vector, thereby producing cloned DNA of
the mammalian genomic DNA.

3. The method of Claim 2 wherein step (f) further
comprises purifying the cDNA of the mammalian
genomic DNA.

4. The method of Claim 2 further comprising 
determining the nucleotide sequence of the 
cloned cDNA of the mammalian genomic DNA. 

5. The method of Claim 1 further comprising the
step of:

f) screening mature RNA obtained in (e) with
appropriate antisense probes, thereby
identifying mature RNA containing a
mammalian genomic DNA exon.

6. The method of Claim 5 further comprising the
steps of:

g) amplifying the product obtained in (e)
using an RNA-based amplification method,
thereby producing cDNA of the mammalian
genomic DNA;

h) digesting the product of (g) with one or
more restriction enzymes, which recognize
a restriction site present in a cloning
vector; and

i) cloning the product of (h) in the cloning
vector, thereby producing cloned DNA of
the mammalian genomic DNA.

7. The method of Claim 1 further comprising the
steps of:

f) amplifying the product obtained in (e)
using an RNA-based amplification method,
thereby producing cDNA of the mammalian
genomic DNA;

g) purifying cDNA of the mammalian genomic
DNA; and

h) sequencing the cDNA obtained in (g).

8. The method of Claim 7, wherein step (f) further
comprises purifying the cDNA and reamplifying
with PCR primers comprising sequences
complementary to sequences adjacent to the splice
sites of the splicing vector.

9. A method of exon amplification, comprising the
steps of:

a) providing fragmented mammalian genomic
DNA;

b) inserting fragmented mammalian genomic DNA
into an intron of an in vivo splicing
plasmid in which the splice sites present
are inefficient or slowly spliced, thereby
producing an in vivo splicing plasmid
construct having mammalian genomic DNA
within a plasmid intron;

c) introducing the splicing plasmid construct
of (b) into a host cell in which it
replicates;

d) maintaining the product of (c) under
conditions appropriate for replication of
the in vivo splicing plasmid construct and
transcription of DNA present in the in
vivo splicing plasmid construct, thereby
producing a host cell containing mature
RNA;

e) obtaining mature RNA which includes a
mammalian genomic exon;

f) producing DNA by reverse transcription of
the mature RNA obtained in (e); and

g) amplifying DNA produced in (f).

10. The method of Claim 9 further comprising
sequencing the product of step (g).

11. The method of Claim 9 wherein the mammalian
genomic DNA is selected from the group
consisting of: human DNA and mouse DNA.

12. A method of exon amplification, comprising the
steps of:

a) providing fragmented mammalian genomic
DNA;

b) inserting fragmented mammalian genomic DNA
into the in vivo splicing plasmid pSPL1 at
a restriction site in the HIV-tat intron
present in pSPL1, thereby producing a
pSPL1 construct having mammalian genomic
DNA present with the HIV-tat intron;

c) introducing the pSPL1 construct into a
host cell in which it replicates;

d) maintaining the product of (c) under
conditions appropriate for replication of
the pSPL1 construct and transcription of
DNA present in the pSPL1 construct,
thereby producing a host cell containing
mature RNA;

e) obtaining mature RNA which includes a
mammalian genomic exon;

f) producing DNA by reverse transcription of
the mature RNA obtained in (e); and

g) amplifying DNA produced in (f).

13. The method of Claim 12 further comprising
sequencing the product of step (g).

14. The method of Claim 12 wherein the mammalian
genomic DNA is selected from the group
consisting of: human DNA and mouse DNA.

15. The method of Claim 14 wherein the host cell in
(c) is a COS7 cell.

16. An in vivo splicing plasmid comprising:

a) an origin of replication that is
functional in an appropriate host cell;

b) a promoter sequence;

c) an intron and transcribed flanking
sequences under the control of said
promoter, including therein a 5' splice
site, a 3' splice site, a branchpoint
recognition sequence, and a
polyadenylation signal, the intron
including one or more cloning sites 5' to
the branchpoint recognition sequence;

d) a bacterial origin of replication; and

e) a selectable marker.

17. The in vivo splicing plasmid of Claim 16
wherein the 5' and 3' splice sites allow
joining with sites inserted into the intron of
the splicing plasmid.

18. An in vivo splicing plasmid comprising:

a) an SV40 origin of replication and early
region promoter; and

b) a β-globin gene in which the second intron
has been replaced by an HIV-tat gene
intron and flanking exon sequences,
including therein the HIV-tat gene 5'
splice site, 3' splice site, and
branchpoint recognition sequence, the
HIV-tat gene intron including a cloning
site.

19. The in vivo splicing plasmid pSPLl. 

20. A kit comprising:

a) an in vivo splicing plasmid comprising:

1) an origin of replication that is
functional in an appropriate host
cell;

2) a promoter sequence;

3) an intron and transcribed flanking
sequence under the control of said
promoter, including therein a 5'
splice site, a 3' splice site, a
branchpoint recognition sequence, and
a polyadenylation signal, the intron
including one or more cloning sites
5' to the branchpoint recognition
sequence;

4) a bacterial origin of replication;
and

5) a selectable marker; and

b) appropriate oligonucleotide primers which
hybridize to transcribed sequences
flanking said intron.

21. The kit of Claim 20, wherein the 5' and 3'
splice sites of the in vivo splicing plasmid
allow joining with sites inserted into the
intron of the splicing plasmid.

22. A kit comprising:

a) an in vivo splicing plasmid comprising:

1) an SV40 origin of replication and
early region promoter; and

2) a β-globin gene in which the second
intron has been replaced by an
HIV-tat intron and flanking exon
sequences, including therein the
HIV-tat gene 5' splice site, 3'
splice site, and branchpoint
recognition sequence, the HIV-tat
gene intron including a cloning site;
and

b) appropriate oligonucleotide primers which
hybridize to the β-globin exon sequences
flanking said HIV-tat intron.

23. A kit comprising:

a) the in vivo splicing plasmid pSPL1; and

b) oligonucleotide primers which hybridize to
transcribed sequences flanking the HIV-tat
intron within said plasmid.

24. The kit of Claim 23, wherein the oligonucleotide
primers comprise one or more of the
following:

a) DHAB15, CCAGTGAGGAGAAGTCTGCGG and DHAB14,
GTGAGCCAGGGCATTGGCC;

b) SD2, GTGAACTGCACTGTGACAAGC and SA2,
ATCTCAGTGGTATTTGTCAGC;

c) SD1, CCCGGATCCGCGCGACGAAGACCTCCTCAAGGC and
SA1, CCCGTCGACGTCGGGTCCCCTCGGGATTGG.